PROJECT DESIGN

This case assumes that we are a real estate company that invests in large cities by buying properties to later rent them out as tourist apartments.

The management has made the decision to invest in Madrid, and has asked us to analyze the data that the industry leader AirBnb makes public to try to find the types of properties that have greater commercial potential for tourist rental.

The main deliverable is the property classification that the valuation team should look for among the existing opportunities in the city and the main neighborhoods or geographic areas to focus on.

GOAL

Find the profile(s) of properties that maximize the commercial potential in the tourist rental market and the main areas to look for them.

KPIs

In this project we use the following KPIs:

Starting questions

On the rental price:

On occupancy:

On the purchase price:

PRELIMINARY FILE ANALYSIS AND CASE PLANNING

We will use real data offered by AirBnB: http://insideairbnb.com/get-the-data.html

LOAD PACKAGES

UNDERSTANDING THE FILES

From following website, we can extract the data from AirBnB:

http://insideairbnb.com/get-the-data.html

The description of each table is below.

After an overview of all tables, we work just with listings.csv and listings.csv.gz

ANALYTICAL DATAMART

In this phase the data quality process is carried out and afterwards all data sources are put together in one table.

DATA IMPORT

Internal data

Load tables

External data

In our data we do not have the purchase price of the properties. Therefore, we are going to look for this data externally.

https://www.immopreise.at/Wien/Wohnung/Eigentum

How to extract data from a website: https://github.com/blancaRE/ds_portfolio/tree/main/Web%20scrapping

We load the data and rename the columns.

Replace the unknown values with the average price.

The prices variables are an objtect. it should be a numeric variable.

Data quality

Listings table

Overview

Variables and types

We will delete those variables that we do not directly need for our goal.

Review the variable types

Covert object to category (objects are not efficient, it takes up a lot of memory space)

Check

Analysis of null values

The name of the property is not a crucial variable. We leave it as it is.

Analysis of duplicates

Check for duplicate records

Categorical variables analysis

There are characters not porperly recognized. We will solve it later.

We realize that there are hotels. Our company does not plan to buy hotels, so we have to delete these records.

Numerical variables analysis

Conlcusions:

We review price lows and highs

Check highs

Check data close to zero.

Conclusion:

Listings_det table

Overview

Variables and data types

We select the variables relevant for our goal

Variable type analisys

Analysis of null values

Conclusions:

Number of beds

First, we estimate the number of beds and afterwards the number of bedrooms.

Try to make an assignment of beds number based on the number of people that can be accommodated.

Observing the abova table, we perform the following approcah:

Null values count and frequency of each value

We create a function to set the nulls of beds depending on accommodates.

Check

Number of bedrooms

We estimate the number of bedrooms. A priori, we could estimate it with either the variable accomodates or beds.

Since the appartament owner may decide to place 10 people in a single room, we consider variable beds to be more reliable

We perform the following assigment:

We create and execute the function to assign values to nulls:

Check

We delete the bathrooms variable.

Analysis of duplicates

We check if there are any duplicate records

Numerical variables analysis

No unusual findings.

ANALYTICAL DATAMART

We have two tables

Both share the id field, so we can merge them.

Aditionally, we have the price_m2 table that we should merge according to the district variable.

Now let's see how we can incorporate the external information of the price per square meter.

To do this the first step is to analyze the values of the variable district in both tables, since they need to match so that we can cross them.

There are some differences in the german characters.

We delete the spaces.

Now we can integrate it to the main table.

Check is some nulls were generated.

DATA PREPARATION

TRANSFROM VARIABLES

Check variables for KPIs

We have identified the necessary KPIs for the analysyi. We check if we have all of them in the proper format.

Price

The documentation does not clarify whether the price is for the entire property, or if a room is rented on a per room basis.

This is a key data to be able to make the valuation of the potential income of a property.

Let's try to understand it by analyzing the average price by type of rental.

It is important to filter by only one district in order not to include the "area" effect.

So first we choose a district that has a lot of data.

Conclusion:

However, multiplying the price by the total number of rooms may result into an overestimation of occupancy, since all rooms of an appartaments will not be reted at the same time.

Therefore we should weight it by the average percentage of rooms rented. We do not have that data, but let's assume that we have talked to the business manager and he has told us that it is 70%.

We can create the total price variable by applying apply on a custom function.

Check

Occupancy

The variable we have that allows us to measure this is availability_365.

This variable tells us the number of days a year ahead that the property is NOT occupied. Therefore we would be interested in transforming it to a more direct measure of occupancy, for example the % of the year that it IS occupied.

Transformation of variables

We are going to discretize and analyze some of the variables: accommodates, bedrooms and beds.

Discretize bedrooms

First we evaluate the distribution of the bedrooms.

Vamos a discretizar para 1,2,3 y más de 3. We discretize one, two, three or more than three bedrooms.

Check

Discretizae accommodates and beds

We discretize the values according the percentiles.

Discretize beds

Creation of variables with external data

We know the purchase price per squared meter depending on the number of squred meters and district of the poroperty.

We also know the rental price according to the number of beds.

In order to find the square meters of the rental properties, first we estimate the number of squared meters with the number of rooms. After that, we compute the purchase price of the property, multiplying the squared meetrs with its price per squared meter.

Estimation of the square meters of the property

We assume the squared meters according to the number of rooms:

Check

Now we can estimate the purchase price of the property.

We recall that we took 25% off the price that we get for negotiation capacity.

Check

Analysis and insights

Price analysis

There is an outlier in the upper part. We select the median instead of the mean to ignore its effect.

All values seem reasonable.

The Innere Stadt district has the highest rental and purchase prices. We zoom in the other districts.

Conclusion:

With a similar purchase price, there are district with a hiogher rental price. For example, Mariahilfer has a similar purchase price of Josefstadt, but the profitability is higher.

With an elevated budget, could pay of to buy several properties in the culter of districts rather than one dindle property in Innere Stadt.

What factors (other than location) determine the rental price?

We analize each variable of the minicube.

Conclusion: one bedroom is not optimal.

Conclusion: the optimal number of accomodates is 3.

Conclusion: the rental price (blue) grows with the numer of beds. The purchase price (green) also grows, but the profitability is smaller when the number of beds is three.

Geographical analysis

We plot the rental price per property on a map.